RFC0055 Identity-Aware Routing#535

Open
rkoster wants to merge 53 commits into develop from feature/app-to-app-mtls-routing

rkoster commented Mar 5, 2026

Summary

Implements Phase 1 (1a + 1b) of the App-to-App mTLS Routing RFC.

Note: This PR is a draft because the RFC for App-to-App mTLS Routing has not been approved yet.

Phase 1a: mTLS Infrastructure

  • Per-domain TLS configuration via GetConfigForClient callback
  • Domain-aware client certificate validation
  • XFCC header handling (sanitize_set mode) for mTLS domains
  • Configurable XFCC format: raw (base64 cert) or envoy (compact hash+subject)
  • BOSH job properties for router.mtls_domains

Phase 1b: Authorization

  • Identity extraction from Diego instance identity certificates
  • Authorization handler enforcing mTLS authorization rules
  • RFC-0027 compliant flat options: mtls_allowed_apps, mtls_allowed_spaces, mtls_allowed_orgs (comma-separated GUIDs), mtls_allow_any (boolean)
  • Route-registrar support for mTLS route options
  • RTR access logs emitted for denied requests (401/403)
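
To make the flat options concrete, here is a minimal sketch of how comma-separated GUID lists might be evaluated against a caller. The type and function names (`callerIdentity`, `routeOptions`, `containsGUID`, `allows`) are hypothetical, and treating the three lists as alternatives (a match in any one of them is enough) is an assumption rather than something stated in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// callerIdentity is a hypothetical shape for the identity extracted from a
// Diego instance identity certificate.
type callerIdentity struct {
	AppGUID, SpaceGUID, OrgGUID string
}

// routeOptions mirrors the flat options named in the description.
type routeOptions struct {
	AllowedApps   string // mtls_allowed_apps, comma-separated GUIDs
	AllowedSpaces string // mtls_allowed_spaces
	AllowedOrgs   string // mtls_allowed_orgs
	AllowAny      bool   // mtls_allow_any
}

// containsGUID reports whether guid appears in the comma-separated list.
// An empty guid never matches, so missing identity fields cannot grant access.
func containsGUID(list, guid string) bool {
	if guid == "" {
		return false
	}
	for _, item := range strings.Split(list, ",") {
		if strings.TrimSpace(item) == guid {
			return true
		}
	}
	return false
}

// allows applies default-deny: a caller passes only if mtls_allow_any is set
// or one of its GUIDs is listed.
func (o routeOptions) allows(c callerIdentity) bool {
	return o.AllowAny ||
		containsGUID(o.AllowedApps, c.AppGUID) ||
		containsGUID(o.AllowedSpaces, c.SpaceGUID) ||
		containsGUID(o.AllowedOrgs, c.OrgGUID)
}

func main() {
	opts := routeOptions{AllowedApps: "frontend-app-guid", AllowedSpaces: "trusted-space-guid"}
	fmt.Println(opts.allows(callerIdentity{AppGUID: "frontend-app-guid"}))
	fmt.Println(opts.allows(callerIdentity{AppGUID: "other-app-guid", SpaceGUID: "other-space-guid"}))
}
```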

Testing

  • Unit tests for all new handlers
  • Integration tests for end-to-end mTLS routing
  • BOSH template tests for configuration

Key Files Changed

GoRouter:

  • src/code.cloudfoundry.org/gorouter/config/config.go - MtlsDomainConfig struct
  • src/code.cloudfoundry.org/gorouter/router/router.go - GetConfigForClient callback
  • src/code.cloudfoundry.org/gorouter/handlers/clientcert.go - Domain-aware XFCC
  • src/code.cloudfoundry.org/gorouter/handlers/identity.go - XFCC parsing
  • src/code.cloudfoundry.org/gorouter/handlers/mtls_authorization.go - Authorization handler
  • src/code.cloudfoundry.org/gorouter/mbus/subscriber.go - Route message parsing
  • src/code.cloudfoundry.org/gorouter/route/pool.go - AllowedSources storage
  • src/code.cloudfoundry.org/gorouter/proxy/proxy.go - Handler wiring

Route Registrar:

  • src/code.cloudfoundry.org/route-registrar/config/config.go - AllowedSources in Options
  • src/code.cloudfoundry.org/route-registrar/messagebus/messagebus.go - NATS message format

BOSH:

  • jobs/gorouter/spec - router.mtls_domains property
  • jobs/gorouter/templates/gorouter.yml.erb - Template configuration

Configuration Example

# BOSH manifest
router:
  mtls_domains:
  - domain: "*.apps.mtls.internal"
    ca_certs: "((diego_instance_identity_ca.certificate))"
    forwarded_client_cert: sanitize_set

# Route registration
routes:
- name: my-api
  uris: ["my-api.apps.mtls.internal"]
  options:
    mtls_allowed_apps: "frontend-app-guid"
    mtls_allowed_spaces: "trusted-space-guid"

Related PRs

rkoster commented Apr 16, 2026

Latest Update: RFC-Compliant Post-Selection Authorization

Implemented a breaking change that replaces pre-selection authorization with strict post-selection enforcement, per RFC lines 475-517.

Key Changes (commit cbf0695)

Architecture:

  • ✅ Composable PostSelectionHandler interface for middleware pipeline
  • ✅ Separation of pre-selection checks (SNI, route lookup, identity) from post-selection authorization
  • ✅ Immediate 403 on authorization failure (non-retriable, per RFC)
  • ✅ Post-selection scope checking with :post-selection suffix in metrics

Implementation:

  • handlers/post_selection_pipeline.go - Infrastructure for composable checks
  • handlers/mtls_scope_auth.go - Org/space boundary enforcement
  • handlers/mtls_access_rules_auth.go - Access rules evaluation (cf:app:, cf:space:, etc.)
  • handlers/mtls_pre_auth.go - Pre-selection checks only
  • handlers/mtls_auth_error.go - Custom error type with Rule/Reason/HTTPStatus

Test Coverage:

  • +44 new tests (14 scope + 17 access rules + 13 pipeline)
  • +4 integration tests for shared route scenarios
  • All 393 tests passing

RFC Compliance

  • Intermittent 403s: expected for shared routes across scope boundaries (RFC-compliant)
  • Error messages: include "caller org X does not match selected backend org Y"
  • Strict enforcement: prevents unauthorized cross-scope access

Breaking Change

⚠️ This replaces the permissive pre-selection authorization entirely. No feature flag provided as this is a security improvement required by the RFC.

Deprecated:

  • handlers/mtls_authorization.go (old implementation with migration notes)
  • route/pool.go EndpointOrgIDs/SpaceIDs methods

Integration Test Results

All integration tests compile successfully. Shared route scenarios validate:

  • Intermittent 403s with scope=space (different spaces in same org)
  • Always succeed with scope=org (same org, different spaces)
  • Always fail with scope=org (different orgs)
  • Per-endpoint access rules with intermittent behavior
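
The intermittent behavior follows from checking the caller against whichever endpoint the load balancer picked. A minimal sketch, assuming scope values "org" and "space" and a hypothetical `checkScope` helper; the space-mismatch message is an assumption modeled on the org message quoted above:

```go
package main

import "fmt"

type identity struct{ OrgGUID, SpaceGUID string }

// checkScope compares the caller with the SELECTED backend only.
// Any scope other than "org" or "space" imposes no boundary in this sketch.
func checkScope(scope string, caller, selected identity) error {
	switch scope {
	case "org":
		if caller.OrgGUID != selected.OrgGUID {
			return fmt.Errorf("caller org %s does not match selected backend org %s",
				caller.OrgGUID, selected.OrgGUID)
		}
	case "space":
		if caller.SpaceGUID != selected.SpaceGUID {
			return fmt.Errorf("caller space %s does not match selected backend space %s",
				caller.SpaceGUID, selected.SpaceGUID)
		}
	}
	return nil
}

func main() {
	caller := identity{OrgGUID: "org-1", SpaceGUID: "space-a"}
	// A shared route backed by apps in two spaces of the same org.
	backends := []identity{
		{OrgGUID: "org-1", SpaceGUID: "space-a"},
		{OrgGUID: "org-1", SpaceGUID: "space-b"},
	}
	for _, b := range backends {
		fmt.Printf("backend space=%s scope=space err=%v\n", b.SpaceGUID, checkScope("space", caller, b))
		fmt.Printf("backend space=%s scope=org   err=%v\n", b.SpaceGUID, checkScope("org", caller, b))
	}
}
```

With scope=space the outcome depends on which backend was selected, which is why the same caller sees a mix of 200 and 403 on a shared route; with scope=org both backends pass.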

Ready for full integration test run and review.

rkoster commented Apr 16, 2026

Refactoring: AuthError for Future Extensibility

Commit: 4ff64b9

Renamed MtlsAuthError to AuthError to prepare for future authentication methods beyond mTLS, such as SPIFFE JWT tokens.

Changes

  • ✅ Renamed handlers/mtls_auth_error.go → handlers/auth_error.go
  • ✅ Updated struct, constructor functions, and all references
  • ✅ Changed error messages from "mTLS authorization denied" to "authorization denied"
  • ✅ Updated all test files

Benefits

  • 🔮 Future-proof: Ready for SPIFFE JWT token authentication
  • 🏗️ Generic design: Error type not tied to specific auth mechanism
  • 🧩 Reusable: Can be used across different authentication methods
  • Clean: Better naming convention for authorization errors

No functional changes - pure refactoring for extensibility.

rkoster force-pushed the feature/app-to-app-mtls-routing branch 3 times, most recently from 1f9b804 to 79271b7 on April 17, 2026 12:12
rkoster force-pushed the feature/app-to-app-mtls-routing branch from 5cc4170 to b875867 on April 20, 2026 09:18
rkoster added 21 commits April 23, 2026 13:19
- Rename mtls_app_to_app_test.go -> identity_aware_routing_test.go
- Update main test suite: 'App-to-App mTLS Routing' -> 'Identity-Aware Routing'
- Rename registerWithAllowedSources() -> registerWithAccessRules()
- Rename registerWithScopeAndAllowedSources() -> registerWithScopeAndAccessRules()
- Update parameter names: allowedSources -> accessRules
- Update test descriptions: 'mtls_allowed_sources' -> 'access rules'
- Update comments to reflect access-rules terminology

This aligns with the RFC's positioning of the feature as 'identity-aware
routing' with access rules, rather than mTLS-specific 'allowed sources'.
…ction auth

Remove EndpointOrgIDs() and EndpointSpaceIDs() which collected org/space
IDs from all endpoints in a pool. These were used by the deprecated
pre-selection authorization approach.

Post-selection authorization (RFC-compliant) checks org/space boundaries
against the SELECTED endpoint's tags, not against all endpoints in the pool,
making these aggregation methods unnecessary.

Route-registrar is used by BOSH-deployed system components (CC, UAA, etc.)
to register their routes. These system components:
- Don't have CF app identities (no Diego instance identity certs)
- Don't use mTLS domains with access control enforcement
- Are out of scope for the app-to-app identity-aware routing RFC

Only Cloud Controller → Diego → NATS should send access_scope/access_rules
for actual CF app routes. Route-registrar doesn't need these fields.

Devbox is used for local development environment setup but should not
be tracked in the repository. The files remain in the working directory
for developers who use devbox.

routing-api is a local submodule in src/code.cloudfoundry.org/routing-api,
not an external dependency. It should not be listed in go.mod.

This was incorrectly added during rebase conflict resolution.

- Export MakeEndpoint method for test access
- Fix test calls to MakeEndpoint with correct parameters
- Remove unnecessary fmt.Sprintf for string argument

This commit fixes all failing integration tests for the identity-aware
routing feature by addressing five critical issues:

1. mTLS client certificate trust chain issue
   - Tests were creating instance identity certs with a different CA than
     the one configured in GoRouter's mTLS domain settings
   - Added CreateInstanceIdentityCertWithCA() helper that accepts an
     existing CA to ensure proper trust chain
   - Updated all test cases to use the shared mtlsDomainCA

2. Authorization errors returning HTTP 502 instead of 403
   - Added custom ErrorHandler to httputil.ReverseProxy that checks for
     AuthError and returns the appropriate HTTP status code (403)
   - Previously all transport errors defaulted to 502 Bad Gateway

3. Per-endpoint access rules not working correctly
   - Authorization handler was checking pool-level access rules (first
     endpoint only) instead of the selected endpoint's rules
   - Changed to use endpoint.AccessRules to support different backends
     with different authorization requirements on the same route

4. Default-deny not enforced for routes without access rules
   - Changed enforcement logic to apply to all requests with CallerIdentity,
     regardless of AccessScope setting

5. SNI/Host header mismatch in test requests
   - Added newMtlsGetRequest() helper with custom DialTLSContext that
     connects to 127.0.0.1 while preserving hostname for TLS SNI
   - Updated all identity-aware routing tests to use this helper

Test results: 20/20 integration tests passing, 17/17 unit tests passing

This prevents the subscriber's ClosedCB from firing log.Fatal when
NATS is stopped first, which was causing the test process to exit
prematurely and leading to port binding conflicts in parallel test
runs.

The cleanup order is now:
1. Terminate gorouter session
2. Stop NATS server
3. Clean up test files

This matches the fix from upstream PR #555 (commit b2bf830) which
resolved similar issues in router/router_test.go.

Apply comprehensive terminology rebranding from 'access rules' to 'route policies'
across the gorouter codebase to align with Cloud Foundry's existing 'network policies'
convention. This matches the terminology changes from GitHub PR #1438 commit be8d74c1.

Key terminology changes:
- Types: MtlsAccessRulesAuth → MtlsRoutePoliciesAuth
- Functions: evaluateAccessRules() → evaluateRoutePolicies()
- Functions: parseCommaSeparatedSelectors() → parseCommaSeparatedSources()
- Constants: AccessScopeAny/Org/Space → RoutePolicyScopeAny/Org/Space
- Struct fields: AccessScope → RoutePolicyScope, AccessRules → RoutePolicies
- JSON tags: access_scope → route_policy_scope, access_rules → route_policy_sources
- Error messages: route:no_access_rules → route:no_route_policies
- Error messages: route:access_rules → route:route_policies
- Comments: 'access rules' → 'route policies', 'selector' → 'source'

Files modified:
- Core: route/pool.go, mbus/subscriber.go
- Handlers: mtls_route_policies_auth.go (renamed), mtls_pre_auth.go, mtls_scope_auth.go, proxy/proxy.go
- Tests: All corresponding test files updated

All unit tests passing (972 specs total):
- handlers: 360/360 passed
- route: 252/252 passed
- mbus: 8/8 passed (subset)
- proxy: 376/376 passed

Fix struct field alignment in test files to pass CI gofmt validation.

Complete the terminology rebrand by updating integration test helper
functions to use RoutePolicyScope and RoutePolicySources instead of
the old AccessScope and AccessRules field names.

Fixes go vet error: unknown field AccessScope in struct literal

This commit fixes a critical bug where route policies were incorrectly
enforced on ALL routes (including public routes and mTLS routes without
--enforce-route-policies flag), causing 403 Forbidden errors.

Root Cause:
- MtlsRoutePoliciesAuth.Check() only checked CallerIdentity == nil to
  decide whether to skip enforcement
- It did NOT check routePolicyScope (which indicates if enforcement is
  actually enabled for the domain)
- This caused enforcement on routes where it should be skipped

Fix:
- Add routePolicyScope check at the beginning of Check() method
- Only enforce route policies when routePolicyScope != "" (enforcement
  enabled via --enforce-route-policies flag on domain creation)
- This mirrors the pattern already used in MtlsScopeAuth handler

Impact:
- Public routes (non-mTLS domains): No longer incorrectly rejected
- mTLS routes WITHOUT --enforce-route-policies: No longer rejected
- mTLS routes WITH --enforce-route-policies: Still correctly enforced

Test Coverage:
- Added regression test for the specific bug scenario:
  RoutePolicyScope empty + CallerIdentity present
- This test would have caught the bug if it existed originally
- All 18 MtlsRoutePoliciesAuth tests now pass
- Test suite now covers all skip scenario combinations

Files Changed:
- src/code.cloudfoundry.org/gorouter/handlers/mtls_route_policies_auth.go
- src/code.cloudfoundry.org/gorouter/handlers/mtls_route_policies_auth_test.go

HTTP clients that include explicit ports in URLs (e.g., https://app.example.com:443/)
result in Go's http.Request.Host containing the port (app.example.com:443).

Previously, GetMtlsDomainConfig() did not strip the port before matching against
configured mTLS domains (e.g., *.apps.identity), causing:
- Domain matching to fail for requests with explicit ports
- No XFCC header added (fell back to default behavior)
- Identity extraction failure in CallerIdentity
- Pre-auth handler denying requests with 403 and reason "identity-extraction-failed"

This particularly affected Java Spring Boot HTTP clients which construct URLs
with explicit ports by default.

Fix: Use net.SplitHostPort() to strip port before domain matching, ensuring
consistent behavior regardless of whether clients include explicit ports.
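
The port-stripping step can be sketched as below. `stripPort` and `matchesDomain` are hypothetical helpers, not the body of GetMtlsDomainConfig(), and the one-label wildcard rule is an assumption; `net.SplitHostPort` is the standard library call named in the fix, and it returns an error when no port is present, so the sketch falls back to the input in that case.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// stripPort removes an explicit port from a Host value. When there is no
// port, net.SplitHostPort fails and the value is returned unchanged.
func stripPort(hostport string) string {
	host, _, err := net.SplitHostPort(hostport)
	if err != nil {
		return hostport
	}
	return host
}

// matchesDomain compares a request Host against a configured mTLS domain.
// Assumption: a leading "*." matches exactly one extra label.
func matchesDomain(pattern, hostport string) bool {
	host := strings.ToLower(stripPort(hostport))
	pattern = strings.ToLower(pattern)
	if parent, ok := strings.CutPrefix(pattern, "*."); ok {
		_, rest, found := strings.Cut(host, ".")
		return found && rest == parent
	}
	return host == pattern
}

func main() {
	for _, h := range []string{
		"my-api.apps.identity",
		"my-api.apps.identity:443",
		"app.example.com:443",
	} {
		fmt.Printf("%-28s matches *.apps.identity: %v\n", h, matchesDomain("*.apps.identity", h))
	}
}
```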

Added comprehensive unit tests covering:
- Wildcard domain matching with/without ports
- Exact domain matching with/without ports
- IsMtlsDomain() function with/without ports
- Negative test cases for non-mTLS domains

rkoster force-pushed the feature/app-to-app-mtls-routing branch from 8467a4a to ac790f0 on April 23, 2026 13:20

The test was incorrectly expecting 403 Forbidden when a route is registered
on an mTLS domain without route policy enforcement enabled. The correct
behavior is to allow the request through (200 OK) and let the backend
handle authorization.

Route policy enforcement is controlled by Cloud Controller via the
RoutePolicyScope field. When RoutePolicyScope is empty (enforcement disabled),
GoRouter allows authenticated requests through. Default-deny only applies when
enforcement IS enabled but no policies are configured.